In this cheatsheet we summarize the main R functions that are used in the SNA4DS course. We will not explain the underlying concepts here, but refer you to the lectures, labs, and slides of the course for that.

The aim of this cheatsheet is that it provides you with an overview of the main functions you will need throughout the course. We hope that it can provide a useful reference for you, as you develop and apply your network analysis skills.

NOTE:

Most functions have multiple arguments. Our aim is in this cheatsheet not to show and discuss the various arguments that exist, because that would yield an unwieldy and very long document. Rather, we recommend you use your R skills and use the help function ? and help and other approaches we teach you in this course to learn about the details of a specific function. If you still can’t figure it out, contact us and we’ll assist you.



1 Main packages in the SNA4DS course

1.1 Overview of igraph and network graph objects

There are two main packages for basic graph generation and manipulation: the igraph package and the statnet package. Actually, statnet is a suite of packages that work together. In this course, we will will make use of several packages from the statnet suite.

The igraph package creates a graph object of type igraph. The statnet suite creates a graph object of type network. There are many things you can do in both packages. Both packages can generate graphs and do basic manipulation, so here you should just use the package whose API you like best. The igraph package provides more mathematical functions to apply to the graph data and the statnet suite provides loads of statistical models that the igraph package does not do.

1.2 The snafun package

The igraph package and statnet suite are jointly very powerful and can support much of your analyses of network data. However, as you read above, they each require graph objects that have specific structures and they can’t deal with a graph object that has a different structure. So, if you want to use the functions from both the igraph and the sna packages, you need network data in igraph format (for the igraph package), in network format (for the network package and some of the sna package) and in matrix format (for many of the functions in the sna package). In other words, you will have to convert your data between these formats and you also have to deal with the differing API’s between these various packages.

Believe it or not, this is a pain and quite annoying.

THE snafun PACKAGE TO THE RESCUE!

The snafun package does three things:

  • First, it provides an (fairly) consistent API, so you don’t have to constantly figure out what a specific argument means for each function;
  • Second, most of the functions in the snafun package work on both objects of class igraph or network. As a result, you can do what you want to do, without bothering with whether the object you work on is of class igraph or network.
  • Third, by removing the pain coming from the constant switching between the two groups of packages and their inconsistent API, you can now actually focus on the fun of network analysis, rather than the frustration.

Oh, and there is a fourth advantage too: the authors of the snafun package are cool people. So, if you have the need for a new function in the package, just get in touch with us and we’ll see what we can do for you.


2 Creating a graph object

This is how you create graph objects of class igraph or network.

Graph creation
Create an empty network
snafun
snafun::create_empty_graph(n_vertices, directed = TRUE, graph = c("igraph", "network"))
igraph
igraph::make_empty_graph()
network
network::network::initialize()
Create a manual (literal) network
snafun
snafun::create_manual_graph(A -+ B -+ C, simplify = TRUE, graph = 'igraph')
igraph
igraph::graph_from_literal(A -+ B -+ C, simplify = TRUE)
network
Create a ring network
snafun
igraph
igraph::make_ring()
network
Create a star network
snafun
igraph
igraph::make_star()
network
Create a random graph with given density
snafun
snafun::create_random_graph(
    n_vertices = 10,
    strategy = "gnm",   # to fix the number of edges
    m = 27,             # number of edges required, resulting density = .3
    directed = TRUE,    # or FALSE
    graph = c("igraph", "network"))   # pick one

snafun::create_random_graph(
    n_vertices = 10,
    strategy = "gnp",   # to fix the probability edges
    p = .3,             # probability for each edge, yields approx. density of .3
    directed = TRUE,    # or FALSE
    graph = c("igraph", "network"))
igraph
igraph::sample_gnm()
igraph::sample_gnp()

igraph::make_directed_graph(n = 10, edges = 27)
igraph::make_undirected_graph(n = 10, edges = 27)
network1
# 10 vertices, density on average .3
sna::rgraph(n = 10, m = 1, tprob = 0.30, mode = 'digraph')

# 10 vertices, 27 edges (ie. density = .3)
sna::rgnm(1, 10, m = 27)
Create a random graph with given dyad census
snafun
create_census_graph(n_vertices = 10, mut = 8, asym = 25, null = 12, 
                                   method = 'exact',
                                   graph = 'igraph')
igraph
network1
# 5 networks, each with 10 vertices, and 8 M, 25 A, and 12 N dyads
sna::rguman(n = 5, nv = 10, mut = 8, asym = 25, null = 12, 
  method = "exact")
Create a random graph with given community struture
snafun
snafun::create_community_graph(
  communitySizes = c(10, 20, 30),
  p_intra = c(0.3, 0.2, 0.3),
  p_inter = 0.2,
  p_del = 0,
  graph = 'igraph')
igraph
network
Create a graph with given components
snafun
create_components_graph(
  n_vertices,
  directed = FALSE,
  membership = NULL,
  graph = 'igraph')
igraph
network
Create random bipartite graph
snafun
snafun::create_bipartite(
  n_type1,                        # number of vertices of type 1
  n_type2,                        # number of vertices of type 2
  strategy = c("gnp", "gnm"),
  p,                              # probability of each cross-type edge
  m,                              # number of cross-type edges
  directed = FALSE,
  mode = c("out", "in", "all"),
  graph = c("igraph", "network")
)
igraph
igraph::sample_bipartite()
network2
network::network.bipartite()
Create graph object from input data
snafun
#  `x` can be: 
#   - an edgelist (in data.frame format)
#   - an incidence matrix (in matrix format)--for a bipartite graph
#   - an adjacency matrix (in matrix format)
#   - `vertices`  can be a data.frame containing vertex attributes

snafun::to_igraph(x, bipartite = FALSE, vertices = NULL)

snafun::to_network(x, bipartite = FALSE, vertices = NULL)
igraph
# from an adjacency matrix
igraph::graph_from_adjacency_matrix(adjmatrix, 
  mode = c("directed", "undirected", "max", "min", "upper", "lower", "plus"),
  weighted = NULL, diag = TRUE, add.colnames = NULL, add.rownames = NA)

igraph::make_graph(edges, ..., n = max(edges), isolates = NULL, directed = TRUE, 
  dir = directed, simplify = TRUE)

# from an adjacency list
igraph::graph_from_adj_list(adjlist, mode = c("out", "in", "all", "total"), 
  duplicate = TRUE)

# from an edgelist
igraph::graph_from_edgelist(el, directed = TRUE)

# from a data.frame
igraph::graph_from_data_frame(d, directed = TRUE, vertices = NULL)

# from an incidence matrix (in matrix format)
igraph::graph_from_incidence_matrix(incidence, directed = FALSE, 
  mode = c("all", "out", "in", "total"), multiple = FALSE,
  weighted = NULL, add.names = NULL)
network
# x can be an adjacency matrix (as a matrix), an incidence matrix (as a matrix), 
# or an edgelist (as a data.frame)
network::network(x, vertex.attr = NULL, vertex.attrnames = NULL, directed = TRUE,
  hyper = FALSE, loops = FALSE, multiple = FALSE, bipartite = FALSE, ...)

# if x is a data.frame
network::as.network(x, directed = TRUE, vertices = NULL, hyper = FALSE, 
  loops = FALSE, multiple = FALSE, bipartite = FALSE, 
  bipartite_col = "is_actor", ...)

# if x is a matrix
network::as.network(x, matrix.type = NULL, directed = TRUE, hyper = FALSE, 
  loops = FALSE, multiple = FALSE, bipartite = FALSE, ignore.eval = TRUE,
  names.eval = NULL, na.rm = FALSE, edge.check = FALSE, ...)
1 The output is a matrix object, not a 'network' object.
2 This function is quite unintuitive and cumbersome


2.1 Additional graph creation functions

The snafun package offers a few functions that assist with the manipulation of graph data in R.

  • snafun::make_edgelist(names = NULL, attribute = NULL)

The input is a data.frame (names) with edge information. The attribute is a vector that contains a node attribute for those vertices.

The function returns a vector or data.frame that can be read into igraph or network.

  • snafun::make_nodelist(names = NULL, attribute = NULL)

The input is a data.frame (names) with edge information. The attribute is another data.frame that contains the values of those edges.

The function returns an edgelist that can be read into igraph or network.




3 Converting between graph classes

As mentioned above, a common network analysis workflow includes the conversion of graph objects between various formats. The igraph and statnet packages provide some very basic functions for this, that are specific to the types of objects that are used in these packages themselves:

Graph object conversion
Convert to an adjacency matrix
igraph
igraph::as_adjacency_matrix(g, sparse = FALSE)
network
network::as.sociomatrix(g)
network::as.matrix.network(g)
Convert to an adjacency list
igraph
igraph::as_adj_list(g)
network
Convert to an edge list
igraph
igraph::as_edgelist(g)
igraph::as_data_frame(g)
network
network::as.data.frame.network(g)
network::as.edgelist(g)

# if you want to include edge weights
sna::as.edgelist.sna(g)
Make the network directed
snafun
igraph
igraph::as.directed(g)
network
Make the network undirected
snafun
# from an adjacency matrix as input
snafun::to_symmetric_matrix(g,
  rule = c("weak", "mutual", "out", "in", "average", "max", "min"),
  na.rm = TRUE)
igraph
igraph::as.undirected(g)

# for example
igraph::as.undirected(g, mode = 'collapse', edge.attrib.comb = list(weight = 'sum'))
network
# from an adjacency matrix as input
sna::symmetrize(g)

# for example
sna::symmetrize(g, rule = "strong")
Project a bipartite graph
snafun
igraph
igraph::bipartite_projection(g)
network
Convert to a line graph
snafun
igraph
igraph::make_line_graph(g)
network

Conversion between various formats is a lot easier with the help of the snafun package. You only need to know some very easy function names to convert between the common graph classes. Here is an overview of which function to use for which conversion:

Convert between various formats
Using the consistent functions of snafun
INPUT OUTPUT
edgelist matrix igraph network
edgelist to_edgelist() to_matrix() to_igraph() to_network()
matrix to_edgelist() to_matrix() to_igraph() to_network()
igraph to_edgelist() to_matrix() to_igraph() to_network()
network to_edgelist() to_matrix() to_igraph() to_network()
Conversion includes bipartite graphs.

Check the arguments for the various options.

4 Manipulating the graph object

Once you have a network object, you will want to explore it and access parts of it or extract info from it.

Here is a table with some of the functions you’ll use all the time in any SNA project.

Manipulate the graph object
Print the graph object
snafun
snafun::print(x)
igraph1
igraph::print.igraph(x)
network1
network::print.network(x)
Check characteristics of the graph object
snafun
snafun::has_edge_attributes(x)

snafun::has_vertex_attributes(x)

snafun::has_vertex_attribute(x, attrname)

snafun::has_edge_attribute(x, attrname)

snafun::has_vertexnames(x)

snafun::has_loops(x)

snafun::is_bipartite(x)

snafun::is_connected(x, rule = c("weak", "strong"))

snafun::is_directed(x)

snafun::is_network(x)

snafun::is_igraph(x)

snafun::is_signed(x)

snafun::is_weighted(x)
igraph
# various functions, including
igraph::any_loop(g)
igraph::any_multiple(g)
igraph::is_bipartite(graph)
igraph::is_connected(graph, mode = c("weak", "strong"))
igraph::is_directed(graph)
igraph::is_igraph(graph)
igraph::is_named(graph)
igraph::is_simple(graph)
igraph::is_weighted(graph)
network
# various functions, including:
network::has.loops(x)
network::is.bipartite(x)
network::is.directed(x)
network::is.multiplex(x)
network::is.network(x)
sna::is.connected(g, connected = "strong")
Access the vertices and edges
snafun
igraph
igraph::V(x)              # vertices
igraph::E(x)              # edges
network
Extract vertex / edge / graph attributes
snafun
snafun::extract_vertex_attribute(x, name)
snafun::extract_vertex_names(x)             # specific to access the vertex names attribute
snafun::extract_edge_attribute(x, name)
snafun::extract_graph_attribute(x, name)
igraph
igraph::vertex_attr(graph, name, index = V(graph))
igraph::V(graph)
attributenameigraph::edgeattr(graph,name,index=E(graph))igraph::E(graph)attributename
    
    igraph::edge_attr(graph, name, index = E(graph))
    igraph::E(graph)attributename

igraph::graph_attr(graph, name)
network2
network::get.vertex.attribute(x, attrname)
network::get.edge.attribute(x, attrname)
network::get.network.attribute(x, attrname)
igraph
network
Extract the ID of an edge
snafun
snafun::extract_edge_id(object, ego, alter, edgelist)
Extract vertex names
snafun
snafun::extract_vertex_names(x)
igraph
igraph::V(x)$name
igraph::vertex_attr(x, name = "names")
network
network::get.vertex.attribute(x, "vertex.names")
network::network.vertex.names(x)
set vertex names
snafun
snafun::add_vertex_names(x, value)
igraph
igraph::V(x)$name <- value
igraph::set_vertex_attr(x, name = "names")
network
network::set.vertex.attr(x, "vertex.names", value)
network::network.vertex.names(x) <- value
List vertex / edge / graph attributes
snafun
snafun::list_vertex_attribute(x, name)
snafun::list_edge_attribute(x, name)
snafun::list_graph_attribute(x, name)
igraph
igraph::vertex_attr_names(graph)
igraph::edge_attr_names(graph)
igraph::graph_attr_names(graph)
network
network::list.vertex.attributes(x)
network::list.edge.attributes(x
network::list.network.attributes(x)
Extract all vertex attributes into a data.frame
snafun
snafun::extract_all_vertex_attributes(g)
igraph
network
Add vertex / edge / graph attributes
snafun
# very flexible and powerful functions to add attributes 
# in several ways, with the same function
snafun::add_edge_attributes(object, attr_name, value, edgelist, overwrite = FALSE)

snafun::add_vertex_attributes(x, attr_name = NULL, value)
snafun::add_graph_attribute(x, attr_name = NULL, value)
igraph
igraph::set_edge_attr(g, "name", value)
igraph::E(g)
name<valueigraph::setvertexattr(g,"name",value)igraph::V(g)name <- value
    
    igraph::set_vertex_attr(g, "name", value)
    igraph::V(g)name <- value

igraph::set_graph_attr(g, "name", value)
g$name <- value
network
network::set.edge.attribute(g, "new_name", value, e = seq_along(g
mel))network::set.edge.value(g,"newname",value,e=seqalong(gmel))
    network::set.edge.value(g, "new_name", value, e = seq_along(gmel))

network::set.network.attribute(g, "new_name", value)

network::set.vertex.attribute(g, "new_name", value, v = seq_len(network::network.size(g)))
Remove vertex / edge / graph attributes
snafun
snafun::remove_edge_attribute(x, attr_name)
snafun::remove_edge_weight(x)  # remove edge weights

snafun::remove_vertex_attribute(x, attr_name)
snafun::remove_vertex_names(x)   # specific for the vertex name attribute

snafun::remove_graph_attribute(x, attr_name)
igraph
igraph::delete_edge_attr(graph, name)
igraph::delete_vertex_attr(graph, name)
igraph::delete_graph_attr(graph, name)
network
network::delete.edge.attribute(x, attrname, ...)
network::delete.vertex.attribute(x, attrname, ...)
network::delete.network.attribute(x, attrname, ...)
Add vertices or edges
snafun
igraph
igraph::add_vertices(graph, nv, ..., attr = list())
igraph::add_edges(graph, edges, ..., attr = list())
network
network::add.vertices(x, nv, vattr = NULL, last.mode = TRUE, ...)
network::add.edge(x, tail, head, names.eval = NULL, vals.eval = NULL, edge.check = FALSE, ...)
network::add.edges(x, tail, head, names.eval = NULL, vals.eval = NULL, ...)
Remove vertices or edges
snafun
snafun::remove_vertices(x, vertices)
igraph
igraph::delete_vertices(graph, v)
igraph::delete_edges(graph, edges)
network
network::delete.edges(x, eid)
network::delete.vertices(x, vid)
Contract vertices into one
snafun
snafun::contract_vertices(g, vertices, method = c("min", "max", "union", "add"),
  attr = NULL, new_name = "set", out_as_igraph = TRUE)
igraph
network
Extract a subset from the graph
snafun
# single function to do it all
snafun::extract_subgraph(x, v_to_keep, e_to_keep)
igraph
# subset based on vertices
igraph::induced_subgraph(g, vids = theVerticesYouWantToKeep)

# subset based on edges
igraph::subgraph.edges(g, eids = theEdgesYouWantToKeep)
network
# subset based on vertices
network::get.inducedSubgraph(g, v = theVerticesYouWantToKeep)

# subset based on edges
network::get.inducedSubgraph(g, eid = theEdgesYouWantToKeep)
Extract egonets from the graph
snafun
snafun::extract_egonet(x, vertices = NULL, order = 1, type = c("all", "out", "in"))
igraph
igraph::make_ego_graph(g, order = 1, nodes = V(graph), mode = c("all", "out", "in"), mindist = 0)
network
sna::ego.extract(dat, ego = NULL, neighborhood = c("combined", "in", "out"))
Clean up the graph
snafun
snafun::find_isolates(x, names = TRUE, loops = FALSE)
snafun::remove_isolates(x, loops = FALSE)
snafun::remove_loops(x)
igraph
igraph::simplify(g)

# for example
igraph::simplify(g, remove.multiple = TRUE,
    remove.loops = TRUE, edge.attrib.comb = list(weight = 'max'))
network
sna::isolates(g, diag=FALSE)
Union of graphs
snafun
snafun::make_union(net1, net2, net3, byname = 'auto')
igraph
igraph::union(net1, net2, net3)
network
Intersection of graphs
snafun
snafun::extract_intersection(net, net2, net3, byname = 'auto', keep.all.vertices = TRUE)
igraph
igraph::intersection(net, net2, net3)
network
1 Often, but certainly not always, it will also work if you just use `print(x)`
2 These functions have many additional arguments

5 Graph level indices

Below you will find a table to determine the foundational indices of a graph at the graph-level.

Explore the graph
Summarize the network
snafun
snafun::g_summary(x, directed = TRUE)
igraph
network
Count the number of vertices
snafun
snafun::count_vertices(x)
igraph
igraph::vcount()
network
network::network.size()
Count the number of edges
snafun
snafun::count_edges(x)
igraph
igraph::ecount(x)
igraph::gsize(x)
network
network::network.edgecount(x)
Density
snafun
snafun::g_density(x, loops = FALSE)
igraph
igraph::edge_density(g)
network
# preferable for biprartite graphs
network::network_density(g)

# preferable for valued graphs
sna::gden(g)
Reciprocity
snafun
snafun::g_reciprocity(x)
igraph
igraph::reciprocity(g)
network
sna::grecip(g,'measure = 'edgewise')
Transitivity
snafun
snafun::g_transitivity(x)
igraph
igraph::transitivity(g, type = 'global')
network
sna::gtrans(g, mode = 'digraph', measure = 'weak', use.adjacency = TRUE)
Mean distance between vertices
snafun
snafun::g_mean_distance(x)
igraph
igraph::mean_distance(g, directed = TRUE, unconnected = TRUE)
network
Degree distribution
snafun
snafun::g_degree_distribution(x, mode = c("out", "in", "all"),
                              type = c("density", "count"),
                              cumulative = FALSE, 
                              loops = FALSE,
                              digits = 3)
igraph
igraph::degree_distribution(graph, cumulative = FALSE, mode = 'out')
network
Dyad census
snafun
snafun::count_dyads(x, echo = TRUE)
igraph
igraph::dyad_census(g)
network
sna::dyad.census(g)
Triad census
snafun
snafun::count_triads(x, echo = TRUE)
igraph
igraph::triad_census(g)
network
sna::triad.census(g, mode = 'digraph')
Degree assortativity
snafun
igraph
igraph::assortativity_degree(g, directed = TRUE)
network
Diameter
snafun
snafun::g_diameter(x, directed = is_directed(x), unconnected = TRUE)
igraph
igraph::diameter(g, directed = TRUE, unconnected = TRUE)

# what is the vertex pair with the longest geodesic
igraph::farthest_vertices(g)
network
Radius
snafun
snafun::g_radius(x, mode = c("all", "out", "in"))
igraph
igraph::radius(graph, mode = c("all", "out", "in", "total"))
network
Compactness
snafun
snafun::g_compactness(x, mode = c("out", "in", "all"))
igraph
network
Centralization
snafun
# a single function to calculate centralization (either `Freeman` or `sd`)
# on a wide range of centrality indices

snafun::g_centralize(x, measure = "betweenness",
  directed = TRUE, mode = c("all", "out", "in"),
  k = 3, damping = 0.85, normalized = TRUE,
  method = c("freeman", "sd"))
igraph1,2
# general function for Freeman centralization
igraph::centralize(scores, theoretical.max = 0, normalized = TRUE)
 
# betweenness centralization
igraph::centr_betw(g, directed = TRUE)

# closeness centralization
igraph::centr_clo(g, mode = 'out', normalized = FALSE)
 
# degree centralization
igraph::centr_degree(g, mode = 'all')

# eigenvector centralization
igraph::centr_eigen(g, directed = TRUE)
network1
# general function for Freeman centralization,
# include any function that calculates vertex centrality
sna::centralization(g, FUN, mode = 'digraph', normalize=TRUE, ...)
Mixing matrix
snafun
snafun::make_mixingmatrix(x, attrname, by_edge = FALSE, loops = has_loops(x))
igraph
network
network::mixingmatrix(object, attrname, useNA = "ifany", expand.bipartite = FALSE)
Correlation between two graphs
snafun
snafun::g_correlation(g1, g2, diag = FALSE)
igraph
network
sna::gcor(g1, g2, mode = 'graph')
Fast-greedy community detection
snafun
snafun::extract_comm_fastgreedy(x, weights = NA, modularity = TRUE, 
  merges = TRUE, membership = TRUE)
igraph
igraph::cluster_fast_greedy(g, merges = TRUE, modularity = TRUE,
  membership = TRUE, weights = NULL)
network
Girvan-Newman community detection
snafun
snafun::extract_comm_girvan(x, weights = NA, directed = TRUE, modularity = TRUE,
  edge.betweenness = FALSE, bridges = FALSE, merges = TRUE, membership = TRUE)
igraph
igraph::cluster_edge_betweenness(g, weights = NULL, directed = TRUE, edge.betweenness = TRUE,
  merges = TRUE, bridges = TRUE, modularity = TRUE, membership = TRUE)
network
Louvain community detection
snafun
snafun::extract_comm_louvain(x, weights = NA, resolution = 1)
igraph
igraph::cluster_louvain(graph, weights = NULL, resolution = 1)
network
Walktrap community detection
snafun
snafun::extract_comm_walktrap(x, weights = NA, steps = 4, modularity = TRUE,
  merges = TRUE, membership = TRUE)
igraph
igraph::cluster_walktrap(g, weights = NULL, steps = 4, merges = TRUE, modularity = TRUE, 
  membership = TRUE)
network
Merge community membership
snafun
snafun::merge_membership(coms, merges)
igraph
network
1 igraph and sna only calculate `Freeman` centralization (snafun does both `Freeman` and `sd`)
2 `$res` or `$vector` return the centrality scores

5.1 Communities and other subgroups

It is informative to know that the snafun functions to extract communities yield results that can be scrutinized by igraph. The snafun package is smart enough to do this, regardless of whether the original input graph was of class igraph or network. Really handy.

Here’s an example.

# generate a random directed graph with 20 vertices and 30 edges
g <- snafun::create_random_graph(20, "gnm", m = 30)

# determine the walktrap communities
walk <- snafun::extract_comm_walktrap(g)
print(walk)
#> IGRAPH clustering walktrap, groups: 6, mod: 0.27
#> + groups:
#>   $`1`
#>   [1]  2  6  7  8 10 11 15 18
#>   
#>   $`2`
#>   [1]  1 13 16 20
#>   
#>   $`3`
#>   [1]  3  4  9 14
#>   
#>   $`4`
#>   + ... omitted several groups/vertices

# get the modularity score
igraph::modularity(walk)
#> [1] 0.2694444

# who is member of which community
igraph::communities(walk)
#> $`1`
#> [1]  2  6  7  8 10 11 15 18
#> 
#> $`2`
#> [1]  1 13 16 20
#> 
#> $`3`
#> [1]  3  4  9 14
#> 
#> $`4`
#> [1] 12 19
#> 
#> $`5`
#> [1] 5
#> 
#> $`6`
#> [1] 17

# which community is a vertex member of
igraph::membership(walk)
#>  [1] 2 1 3 3 5 1 1 1 3 1 1 4 2 3 1 2 6 1 4 2

# number of communities
length(walk)
#> [1] 6

# size of each community
igraph::sizes(walk)
#> Community sizes
#> 1 2 3 4 5 6 
#> 8 4 4 2 1 1

# which edge connects multiple communities
igraph::crossing(walk, g)
#>  [1]  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
#> [13] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
#> [25] FALSE FALSE  TRUE FALSE FALSE  TRUE

# plot the network, highlighting the communities
plot(walk, g)

If you are so inclined, you can plot the community division as a dendrogram, as follows:

snafun::plot_comm_dendrogram(walk)

6 Vertex-level indices

Here are the functions to determine many of the vertex-level indices you will want to use in this course.

Explore the vertices
Degree
snafun
snafun::v_degree(x, vids = NULL, mode = c("all", "out", "in"),
  loops = FALSE, rescaled = FALSE)
igraph
igraph::degree(g, mode = 'out')
network
sna::degree(g, gmode = 'digraph', cmode = 'outdegree')
Betweenness
snafun
snafun::v_betweenness(x, vids = NULL, directed = TRUE, rescaled = FALSE)
igraph
igraph::betweenness(g, directed = TRUE)
network
sna::betweenness(g, gmode = 'digraph', cmode = 'directed')
Flow betweenness
snafun
igraph
network
sna::flowbet(g, gmode = 'digraph', cmode = 'rawflow')
Bonacich power centrality
snafun
igraph
igraph::power_centrality(g)
network
sna::bonpow(g, gmode = 'digraph')
Closeness centrality
snafun
snafun::v_closeness(x, vids = NULL, mode = c("all", "out", "in"), rescaled = FALSE)
igraph
igraph::closeness(g, mode = 'all')
network
sna::closeness(g, gmode = 'digraph', cmode = 'directed')
Harmonic centrality
snafun
snafun::v_harmonic(x, vids = NULL, mode = c("all", "out", "in"), rescaled = FALSE)
igraph
igraph::harmonic_centrality(g, vids = V(graph), 
  mode = c("out", "in", "all"), weights = NULL)
network
Stress centrality
snafun
snafun::v_stress(x, vids = NULL, directed = TRUE, rescaled = FALSE)
igraph
network
sna::stresscent(g, gmode = 'digraph', cmode = 'directed')
Eccentricity
snafun
snafun::v_eccentricity(x, vids = NULL, mode = c("all", "out", "in"), rescaled = FALSE)
igraph
igraph::eccentricity(g, mode = 'all')
network
Eigenvector
snafun
snafun::v_eigenvector(x, directed = TRUE, rescaled = FALSE)
igraph
igraph::eigen_centrality(g, directed = TRUE, scale = FALSE)$vector
network
sna::evcent(g, gmode = 'digraph', rescale=FALSE)
Page rank
snafun
snafun::v_pagerank(x, vids = NULL, damping = 0.85, directed = TRUE, rescaled = FALSE)
igraph
igraph::page_rank(g, vids = V(graph), directed = TRUE, damping = 0.85, weights = NULL)
network
Geo k-path
snafun
# without using weights
snafun::v_geokpath(x, vids = NULL, mode = c("all", "out", "in"), 
  k = 3, rescaled = FALSE)

# if weights are to be used
snafun::v_geokpath_w(x, vids = NULL, mode = c("all", "out", "in"),
  weights = NULL, k = 3)
igraph
network
Shapley centrality
snafun
snafun::v_shapley(x, add.vertex.names = FALSE, vids = NULL, rescaled = FALSE)
igraph
network
Who are the neighbors of a vertex
snafun
snafun::extract_neighbors(x, vertex, type = c("out", "in", "all"))
igraph
igraph::neighbors(g, 'Jane', mode = 'out')

# all options
igraph::neighbors(graph, v, mode = c('out', 'in', 'all', 'total'))
network
network::get.neighborhood(g, 1, 'out')

# all options
network::get.neighborhood(x, v, type = c('out', 'in', 'combined'), na.omit = TRUE)
Neighborhood of a vertex1
snafun
igraph
igraph::make_ego_graph(g, order = 1, nodes = "Jane", mode = "all")

# all options
igraph::make_ego_graph(graph, order = 1, nodes = V(graph),
  mode = c("all", "out", "in"), mindist = 0)
network
sna::ego.extract(dat, ego = NULL, neighborhood = c("combined", "in", "out"))

sna::neighborhood(dat, order, neighborhood.type = c("in", "out", "total"),
  mode = "digraph", diag = FALSE, thresh = 0, return.all = FALSE, partial = TRUE)
1 These functions serve equivalent purposes, but yield quite different kinds of outputs

7 Dyad-level indices

Here are the functions for several of the dyad-level indices you will want to use in this course.

Explore the dyads
shortest path for a given set of vertics
snafun
igraph
igraph::all_shortest_paths(g, from = IDofVertex, to = igraph::V(g), mode = 'out')
network
Geodesic lengths1
snafun
snafun::d_distance(x, mode = c("all", "out", "in"))
igraph
igraph::distances(g, mode = 'out')
network
sna::geodist(g, count.paths = TRUE)$counts
Structural equivalence
snafun
snafun::d_structural_equivalence(x, weights = NA, digits = 3, suppressWarnings = TRUE)
igraph
network
sna::sedist(g, method = "correlation", mode ="digraph", diag=FALSE)
Edge betweenness
snafun
igraph
igraph::edge.betweenness(g, directed = FALSE)
network
1 The output is a table with an entry per vertex pair




8 Plotting

8.1 Plotting in snafun

The snafun package provides a plot(x) function, which allows the user to make a quick plot of a network, regardless of whether x is a graph of class network or igraph. This function wraps network::plot.network and igraph::plot.igraph and will use the default settings of these functions.

However, one can also use all of the arguments from network::plot.network and igraph::plot.igraph to make the plots nicer when wanted.

Here is an example for a network object.

g_n <- snafun::create_random_graph(10, "gnm", m = 20, graph = "network")
snafun::plot(g_n)

snafun::plot(g_n, vertex.cex = 3, vertex.col = "green", edge.lwd = 10, 
  edge.col = "darkgrey", usecurve = TRUE, edge.curve = .05, 
  arrowhead.cex = 3, displaylabels = TRUE, label.pos = 5)

And for an igraph graph object:

g_i <- snafun::create_random_graph(10, "gnm", m = 20, graph = "igraph")
snafun::plot(g_i)

snafun::plot(g_i, vertex.size = 12, vertex.color = "green", edge.width = 5, edge.curved = TRUE)

The most common use case for the function is to make as quick plot of the network, using default settings, as one usually would do in the initial phase of a study. For this, snafun provides a consistent function name.

The snafun package also contains a function to plot centrality scores of the vertices. The function and its options are specified as follows:

snafun::plot_centralities(
  net,
  measures = c("betweenness", "closeness", "degree", "eccentricity"),
  directed = TRUE,
  mode = c("all", "out", "in"),
  k = 3,
  rescaled = FALSE,
  ...
)

This yields a plot like this:

The function takes an object of class igraph or network and plots the centrality scores you select, so you can visually compare them. Make sure to pick the required value for mode (the default is “all”).

8.2 Basic plotting in igraph

The plot function alone already plots nodes and edges with default options. More sophisticated specifications need to be manually set. It works with networks of class igraph.

igraph::plot.igraph(net,
     edge.arrow.size = .2,                # edge and arrow size
     edge.color = "red",                  # edge color
     vertex.color = "blue",               # vertex filling color
     vertex.frame.color = "green",        # vertex perimeter color
     vertex.label = igraph::V(net)$label, # vertex labels
     vertex.label.cex = 0.6,              # vertex label size
     vertex.label.color = "black")        # vertex label color

8.3 Basic plotting in network

The gplot function alone already plots nodes and edges with default options. More sophisticated specifications need to be manually set. It works with networks of class network.

network::plot.network(net,
      arrowhead.cex = 0.2,     # edge and arrow size
      edge.col = 'red',        # edge color
      vertex.col = 'blue',     # vertex filling color
      vertex.border = 'green', # vertex perimeter color
      displaylabels = TRUE,    # vertex labels
      label.cex = 0.6,         # vertex label size
      label.col = 'black')     # vertex label color

8.4 Basic plotting in sna

The gplot function alone already plots nodes and edges with default options. More sophisticated specifications need to be manually set. It works with networks of class network.

sna::gplot(net,
      arrowhead.cex = 0.2,     # edge and arrow size
      edge.col = 'red',        # edge color
      vertex.col = 'blue',     # vertex filling color
      vertex.border = 'green', # vertex perimeter color
      displaylabels = TRUE,    # vertex labels
      label.cex = 0.6,         # vertex label size
      label.col = 'black')     # vertex label color

The gplot function has a few additional arguments compared to network::plot.network and is therefore slightly more flexible.




9 Statistical models

9.1 Overview table

Here is an overview of the statistical models discussed in the course.


Statistical network models
When Which approach Function
Dependent vertex attribute explained by a network weight matrix and a matrix of covariates Network autocorrelation model
snafun::stat_nam
A statistic on a single network Conditional Uniform Graph test
sna::cug.test
Association between two networks QAP
sna::qaptest
A valued dependent network explained by one or more explanatory networks QAP linear model
sna::netlm
A binary dependent network explained by one or more explanatory networks QAP logistic model
sna::netlogit
A binary or valued dependent network explained by a set of endogenous and exogenous variables Exponential random graph models
ergm::ergm


9.2 Network autocorrelation models

The network autocorrelation model is run through the snafun::stat_nam function. The implementation is appropriate for continuous dependent variables.

The basic function call is as follows:

snafun::stat_nam(
  formula,
  data = list(),
  W,
  W2 = NULL,
  model = c("lag", "error", "combined"),
  na.action,
  Durbin = FALSE,
  quiet = TRUE,
  zero.policy = TRUE,
  check_vars = TRUE
)

Here,

  • formula is a formula of the form DEP ~ EXO1 + EXO2 + EXO3 or DEP ~ . This follows the standard way of formulating formulas in R. Run ?stats::formula in your console for details.

  • data is a data.frame or a list containing the variables to be included into the model.

  • W is a matrix of the same dimension as the network, containing the weights that drive the network influence process (for the lag model) or the network disturbances process (for the error model).

  • W2 is a matrix of the same dimension as the network, containing the weights that drive the network disturbances when running a combined model.

An intercept is added to the model by default, it is best to NOT include this as a separate exogenous variable.

The weigh matrix (matrices) is (are) row-normalized by this function. If you want to run the model using non-standardized weight matrices, use the sna::lnam function instead.

The snafun package provides a useful stat_nam_summary method (that shows you an overview of the results) and a plot_nam method (that you use to check model assumptions).


9.3 Conditional Uniform graphs (CUG)

There are two methods to perform a conditional Uniform graph test.

The first is to generate the graphs manually and calculate the measures on each graph. Generation of these graphs can be done using snafun::create_random_graph (which conditions on size and density). The equivalent functions in sna are sna::rgraph and sna::rgnm (note that these generate a matrix, rather than an igraph or network graph). See the data generation table for these functions.

The second approach is to use a function that does the graph generation and computes the network measure for you. The preferred is sna::cugtest, which is specified as follows:

sna::cug.test(g, FUN, mode = c("digraph", "graph"), cmode = c("size",
    "edges", "dyad.census"), reps = 1000,
    ignore.eval = TRUE, FUN.args = list())

See the sna help function for details.

Here

  • FUN is the function that needs to be calculated on each graph

  • FUN.args contains any arguments that are required for the function you specified in FUN

  • cmode determines the type of graphs that are drawn (ie. what you condition on). The options are

    • “size”: this generates graphs with a particular size and density 0.5. You rarely want this.

    • “edges”: this conditions on a specific edge count (or an exact edge value distribution)

    • “dyad.census”: this conditions on a dyad census (or dyad value distribution)

For example, in order to test whether the transitivity in your graph g is exceptional for a network of the same size and density as in g, you would run

sna::cug.test(g, sna::gtrans, cmode = "edges")

It is wise to always explicitly tell the function whether your graph is directed or not, so a better way to specify the previous function is

sna::cug.test(g, mode = "graph", FUN = sna::gtrans,
              cmode = "edges", reps = 1000,
              FUN.args = list(mode = "graph"))

Testing the betweenness centralization of you network g could be performed as follows, again conditioning on size and density:

sna::cug.test(g,
              sna::centralization,
              FUN.arg=list(FUN = sna::betweenness),
              mode="graph",
              cmode="edges")

There is also a useful plot method for the result of the CUG test.


The huuuuuge downside of the cug.test function is that it only takes specific input graps (perferably in matrix-format) and only allows one to test with functions implemented within the sna package. However, snafun can fix that. You only need to construct a simple function with the following structure (yes, always start the function with x <- snafun::fix_cug_input(x, directed = directed), this fixes some peculiarities related to the sna::cug.test function):

trans_f <- function(x, directed = FALSE) {
  x <- snafun::fix_cug_input(x, directed = directed)
  snafun::g_transitivity(x)
}

and you can now test for transitivity with ANY graph that works with snafun::g_transitivity(x) (which includes most formats you’ll work with). Then, perform the CUG test as follows:

sna::cug.test(graph_object, 
              mode = "graph", 
              FUN = trans_f, 
              cmode = "edges", 
              reps = 200)

Et voila. It simply works.

Wanna use a more involved measure that doesn’t exist in the sna package anyway? Sure, this is how you run a CUG test on the number of walktrap communities in the graph:

walk_num <- function(x, directed = FALSE) {
  x <- snafun::fix_cug_input(x, directed = directed)
  snafun::extract_comm_walktrap(x) |> length()
}

sna::cug.test(graph_object, mode = "graph", FUN = walk_num, cmode = "edges", reps = 200)

As said, snafun puts the FUN straight into SNA.


9.4 QAP test

There are two methods to perform a QAP test.

The first is to manually permute the graph. Generation of these graphs can be done using igraph::permute or sna::rmperm. See the data generation table for these functions.

The second approach is to use a function that does the graph permutation and computes the required measure (typically a correlation) for you. The preferred is sna::qaptest, which is specified as follows:

sna::qaptest(g, FUN, reps = 1000, ...)

See the sna help function for details.

Here

  • FUN is the function that needs to be calculated after each permutation

  • ... contains any arguments that are required for the function you specified in FUN

Typically, you want to test the correlation between two graphs, as follows:

sna::qaptest(list(firstNetwork, secondNetwork),
             FUN = sna::gcor, reps = 1000,
             g1 = 1, g2 = 2)

There is a useful summary method and a plot method for the output of the function.


9.5 QAP linear regression

QAP linear regression is performed through the sna::netlm function. The function looks as follows:

sna::netlm(y, x, intercept = TRUE, mode = "digraph",
    nullhyp = "qapspp", reps = 1000)

Make sure to always set intercept = TRUE and nullhyp = "qapspp". For small networks, 1000 replications should be enough, for larger networks you should typically use a higher number (say, 2000).

As an example, this is how you specify a model where graph g is modeled as a linear function of graphs g1, g2, and g3.

mod <- sna::netlm(y = g, x = list(g1, g2, g3), intercept = TRUE,
                              nullhyp = 'qapspp', reps = 1001)
mod$names <- c("Intcpt", "Net1", "Net2", "Net3")
summary(mod)

It is wise to add the names of the networks to the output object, like you see above. That is not strictly necessary, but it makes the output of the function easier to read.


9.6 QAP logistic regression

QAP logistic regression is performed through the sna::netlogit function. The function looks as follows:

sna::netlogit(y, x, intercept = TRUE, mode = "digraph",
    nullhyp = "qapspp", reps = 1000)

Make sure to always set intercept = TRUE and nullhyp = "qapspp". For small networks, 1000 replications should be enough, for larger networks you should typically use a higher number (say, 2000).

As an example, this is how you specify a model where binary graph g is modeled as a function of graphs g1, g2, and g3.

mod <- sna::netlogit(g, list(g1, g2, g3),
                     intercept = TRUE,
                     nullhyp = "qapspp", reps = 1001)
mod$names <- c("Intcpt", "Net1", "Net2", "Net3")
summary(mod)




9.7 Terms classification for every Exponential Random Graph Model (ERGM)

Terms can be classified in six main ways.

  • Dyadic independent and dyadic dependent terms: We encounter the first one when the probability of edge formation is related to nodes properties or attributes; we encounter the second when the probability of edge formation depends on other existing edges.

  • Structural and nodal attributes terms: The first kind provides tools to understand the structure of the network per se; the second kind provides tools to explain how nodal attributes might have influenced the formation of edges.

  • Terms for directed networks and terms for undirected networks

  • Exogenous and Endogenous terms: The first one refers to terms using covariates, the second to structural terms.

  • Markovian or non-Markovian: a Markovian term measures the structure in a network neighborhood

  • Curved (geometrically weighted ) or non-curved: terms that are tweaked to improve the model stability

9.8 Binary Exponential Random Graph Model (ERGM)

An ERGM model is performed through the ergm::ergm function. The basic function call is as follows:

fit <- ergm::ergm(formula)

The formula requires the specification of a network dependent variable, and a list of terms.

9.8.5 Terms specifications

Use the argument levels within the term specification for selecting the baseline or reference category.

Example: set female as a reference category.

fit <- ergm::ergm(Net ~ edges + nodefactor('sex', levels = -(2)))

9.8.6 Searching for terms

You can look for additional terms with

search.ergmTerms(keyword, net, categories, name)

You have four arguments to help you finding terms:

  • keyword optional character keyword to search for in the text of the term descriptions. Only matching terms will be returned. Matching is case insensitive.

  • net a network object that the term would be applied to, used as template to determine directedness, bipartite, etc

  • categories optional character vector of category tags to use to restrict the results (i.e. ‘curved’, ‘triad-related’) –see categorization of terms in the manual

  • name optional character name of a specific term to return

9.8.7 Checking your data before the analysis

Before you run any exponential random graph model you must know your data by heart. Not only using descriptive network statistics, but also checking model specifications, before hitting the run button.

  • Manually check the attribute(s) (numeric, integer, categorical, ordinal)
table(snafun::extract_vertex_attribute(Net, 'sex'))
  • check mixing of categorical attributes
snafun::make_mixingmatrix(Net, "sex")
  • check model statistics.
summary(Net ~ edges + nodefactor('sex'))

This last one provides the number of observed cases under the assumptions of each term.

9.8.8 Reading results

You interpret ERGM results as logit models results. Two options:

  • Compute odd ratios for each coefficient
OR <- exp(coef)
  • Compute probability for each coefficient
P <- exp(coef) / (1 + exp(coef))
  • Compute odd ratios using the SNA4DS function

OR <- snafun::stat_ef_int(m)
  • Compute probability using the snafun function

P <- snafun::stat_ef_int(m, type = "probs")

9.8.9 Simulating networks

It is sometimes helpful to simulate networks with the same features at the one you observed in real life.

  • Simulating a network from a model
fit <- ergm::ergm(Net ~ edges)
simfit <- simulate(fit, burnin = 1e+6, verbose = TRUE, seed = 9)
  • simulate network fixing the coefficient results

RandomNet <- network::network(16,density=0.1,directed=FALSE)

sim <- simulate(~ edges + kstar(2), nsim = 2, coef = c(-1.8, 0.03),
                  basis = RandomNet,
                  control = ergm::control.simulate(
                    MCMC.burnin=1000,
                    MCMC.interval=100))
sim[[1]]

9.8.10 MCMC Diagostics

You can check the Monte Carlo Markov Chains diagnostic for your dyadic dependent model using the function:

ergm::mcmc.diagnostics(fit)

9.8.11 Goodness of Fit

You can check the goodness of fit of your model using the function

ergm::gof(fit)

You can also plot your gof output

plot(ergm::gof(fit))

or, making use of your new best friend (the snafun package):

snafun::stat_plot_gof(fit)
stat_plot_gof_as_btergm(fit)




9.9 Bipartite ERGMs

A bipartite ERGM works exactly the same way as a binary one. However, in order to make it handle data differentiating between two partitions, it is necessary to use some specially defined terms. Moreover, you will need to specify the model with more advanced settings since it is computationally more demanding.

9.9.1 Importing Bipartite ERGMs

Since you are using the same function as binary ERGM (ergm::ergm) to run the model, it is necessary to make sure that the software knows that it needs to handle a bipartite structure.

9.9.1.1 Step one: Specify the incidence matrix

The data that contains bipartite network information needs to be specified into a partition 1 X partition 2 data frame or matrix.

If 10 people attend 4 events, the incidence matrix will have dimensions 10 X 4.

9.9.1.2 Step two: Import the network as a bipartite

You can import the network as bipartite using these specifications.


BipNet <- network::network(BipData, directed = FALSE, bipartite = TRUE)

9.9.1.3 Step three: Import the attributes

You can import the attributes using this code. The attribute vector needs to contain as many elements as Partition 1 + Partition 2. However, it is unlikely to have an attribute that makes sense for both partitions at the same time.

Make sure to insert the information that concerns partition 1 and afterward the information that concerns partition 2.

Make sure to code as NA the entry for the partition for which you do not have information.

For instance, if we have one attribute for partition one in a network with 10 nodes in partition 1 and 4 in partition 2, we will have the first ten digits storing information about nodes in partition 1 and 4 NAs for partition 2.


# vertex.names <- vector of names
attrib1 <- as.character(c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, NA, NA, NA, NA))

snafun::add_vertex_attributes(BipNet, "vertex.names",  vertex.names)
snafun::add_vertex_attributes(BipNet, "attrib1",  attrib1)

9.9.1.4 Step four: Bipartite extra info

In order to make sure that the software correctly reads your bipartite network, you need to code an extra attribute.

The attribute focuses on partition one. For example, if we have 10 nodes in partition 1 we will use:


snafun::add_vertex_attributes(BipNet, "bipratite", value = rep(10, 10), v = 1:10)

Note: bipratite, yes, it is a typo, but a typo in the package. Hence make sure you misspell it; otherwise, you will get an error. (FUN FACT!)

9.9.2 Terms for Bipartite ERGMs

Bipartite ERGMs terms are provided at the same time for both partitions since it is relevant to consider the same structure from both perspectives.

Triad Census
Triad Census

You can find the full list of bipartite terms by running:


ergm::search.ergmTerms(categories = "bipartite")

They are 32 in total, so it is manageable.

9.9.3 Specifying advanced options

Since a bipartite ERGM is computationally more demanding than a regular one, you need to make sure you specify the advanced options offered by the ergm package.

9.9.3.1 Constraints

Constraints are options that allow you to set limits your simulation takes into account

For instance, you can limit the simulation setting a min and a max degree.

constraints= ~ bd(minout = 0, maxout = 7)

For instance,

m <- ergm::ergm(BipNet ~ edges  + b1factor("attr1", levels = -1) + b1star(2),
                 constraints= ~ bd(minout = 0, maxout = 7))

There are many other constraints that it is possible to use

9.9.3.2 Control

Controls are options that allow you to be aware of what your simulation is doing to a larger extent and, for this reason, to make it faster.

There are several options. For instance,

  • MCMC.burnin - ignore than many chains before starting to estimate parameters

  • MCMC.samplesize - collect that number of information from the previous state in order to inform the following one

  • seed - makes the simulation go the same way every time it is run

  • MCMLE.maxit - breaks the algorithm after that number of attempts.

For instance,

m <- ergm::ergm(BipNet ~ edges  + b1factor("attr1", levels = -1) + b1star(2),
                 constraints= ~ bd(minout = 0, maxout = 7),
                 control = ergm::control.ergm(MCMC.burnin = 5000,
                                              MCMC.samplesize = 10000,
                                              seed = 1234,
                                              MCMLE.maxit = 20))




9.10 Weighted ERGMs

A weighted network is a network where the edges express the weight or the intensity of the relationship.

It is still unimodal, but it contains more information.

In order to use weighted ergms, the network needs to be fully connected.

Different kinds of weights require different models. You need to check:

  • what kind of variable type characterizes your weights (integer, count, numeric, ordinal…)
  • what kind of distribution does your weight have

The GERGM package works with fully connected networks and (theoretically ) every kind of weight variable.

The GERGM package does not recognize either the network or the igraph classes.

You need to work with weighted adjacency matrices (NxN - squared, that has inside the weight, rather than 0s and 1s).

9.10.1 Documentation

Installation:

remotes::install_github("matthewjdenny/GERGM", dependencies = TRUE)

User manual

https://github.com/matthewjdenny/GERGM

Vignette

Run browseVignettes("GERGM")in the R console

9.10.4 Running the model

First, we specify the formula,

formula <- adjacencyMatrix ~ edges +
  sender("myCov") +
  receiver("myCov") +
  netcov(otherAdjMat) +
  mutual(alpha = .9)

Then we run the model


set.seed(5)
gergmResults <- GERGM::gergm(formula,
                      estimation_method = "Metropolis", # chose the algorithm to estimate the model
                      covariate_data = covariateData, # passing attributes on
                      number_of_networks_to_simulate = 100000, # same as ergm
                      MCMC_burnin = 10000, # same as ergm
                      thin = 1/10, # retaining only a small number of simulated Networks in the computer memory
                      transformation_type = "Cauchy") # distribution of the weight

The GERGM::gergm function automatically prints results while it runs. However, it is also possible to print them again separately.

9.10.4.1 Plotting MCMC diagnostics

GERGM::Trace_Plot(gergmResults)

9.10.4.2 Plotting the Goodnes of fit

GOF(gergmResults)

9.10.4.3 Plotting the results

GERGM::Estimate_Plot(gergmResults)

9.10.4.4 Printing a table with standard errors and coefficients

(EstSE <- rbind(t(attributes(gergmResults)$theta.coef),
                t(attributes(gergmResults)$lambda.coef)))

9.10.4.5 Significance

GERGM models use confidence intervals instead of p-values.

You can estimate the confidence interval using this formula

lower = coef - SE*(-qnorm((1 - 0.95)/2))
upper = coef + SE*(-qnorm((1 - 0.95)/2))

If the lower and the upper intervals are both negative or both positive, the coefficient is significant.

If the lower and the upper intervals have different sign, the coefficient is not significant.




9.11 ERGM for temporal networks

The TERGM (=Temporal ERGM) can be used to model a sequence of binary networks. The model is very similar to the ERGM, but the dependent variable is now a list of networks: the first element is the network at time 1, the second element is the network at time 2, et cetera.

9.11.1 How to store the data for a TERGM

You store the data needed for fitting a TERGM as follows.

What How to store
Time-varying dyadic covariates Either as a list of networks or matrices
Constant dyadic covariates Single network or matrix
Vertex level attributes As vertex attributes inside the observed network objects



9.11.2 How to fit a TERGM

You fit a TERGM with the btergm package (to be installed from CRAN). This fits the model with MPLE, using bootstrapping to derive the standard errors.

The btergm package is compatible with the ergm package and you can use the terms from that package inside btergm.

There are three groups of temporal measures you can specify: memory, delayed reciprocity, and time covariates.




Temporal effects for the ERGM
meaning btergm
memory
Positive autoregression Previous existing edges persist in a next network
 btergm::memory(type = "autoregression", lag = 1)
Dyadic stability Both previous existing and non-existing ties are carried over to the current network
 btergm::memory(type = "stability", lag = 1)
Edge innovation A non-existing previous tie becomes existent in the current network
 btergm::memory(type = "innovation", lag = 1)
Edge loss An existing previous tie is dissolved in the current network
 btergm::memory(type = "loss", lag = 1)
delayed reciprocity
reciprocity if node j is tied to node i at t = 1, does this lead to a reciprocation of that tie back from i to j at t = 2?
 btergm::delrecip(mutuality = FALSE, lag = 1)
mutuality if node j is tied to node i at t = 1, does this lead to a reciprocation of that tie back from i to j at t = 2 AND if i is not tied to j at t = 1, will this lead to j not being tied to i at t = 2? This captures a trend away from asymmetry.
 btergm::delrecip(mutuality = TRUE, lag = 1)
time covariates
time effect per se Test for a specific trend (linear or non-linear) for edge formation
 btergm::timecov(transform = function(t) t)
Time effect of a covariate Interaction effect to test whether the importance of a covariate increases or decreases over time
 btergm::timecov(x, transform = function(t) t)


Visually:


When a timecov is specified without including the value for the transform, the specification defaults to a linear trend over time. For example: timecov() (= the effect of time per se) or timecov(militaryDisputes) (= a linearly increasing or decreasing effect of militaryDisputes over time).

9.11.3 Parallel processing

The btergm package uses MPLE and that lends itself well to parallel processing. You specificy that you want to use parallel processing using the argument parallel.

Windows users can only use parallel = "snow". Other systems can use either parallel = "snow" or parallel = "multicore". The latter is probably often the better choice for non-Windows machines. Both options require that you have the parallel package installed.

If you use the parallel option, you should also specify the appropriate number of cores you want to use. Either set ncpus = 4 (for four cores) or use ncpus = parallel::detectCores() to have R recognize the number of cores automatically (this usually works well, but not always). The ncpus argument is ignored if you do not specify the parallel argument.

The default is no parallel processing.

9.11.4 Goodness-of-fit

Goodness of fit is determined by

gof_m <- btergm:::gof.btergm(m, statistics = btergm_statistics), where

m is a fitted btergm model and btergm_statistics is a vector with statistics to be included in the GoF. The default is

c(btergm::dsp,btergm::esp,btergm::deg,btergm::ideg,btergm::geodesic,btergm::rocpr,btergm::walktrap.modularity).

Of course, the more statistics you include (and the more complex the statistics), the more time it will take for the GoF calculations to finish.

Use ?btergm:::`gof,btergm-method` for more options.

The GoF object can be plotted using btergm:::plot.gof(gof_m) or, often more usefully, through snafun::stat_plot_gof(gof_m). More convenient is to use the helper function from the snafun package. This is:

gof_m <- snafun::stat_plot_gof_as_btergm(m)

By default, this includes the statistics c(btergm::esp, btergm::geodesic, btergm::deg, btergm::rocpr) in the goodness-of-fit, but you can specify other statistics if you prefer. The function returns the goodness-of-fit (in this case, in the gof_m object) and plots it as well. This prevents you from having to use the triple colon ::: for btergm:::gof.btergm and is generally more convenient. If you need the full flexibility of the btergm:::gof.btergm function, use that directly. The results between the two functions are identical.

If you include the btergm::rocpr “statistic” in your GoF, the red line is the Receiver Operating Characteristic (ROC) curve and the blue line is the Precision-Recall curve.

If you see value in the ROC or PR, you can find the arreas under the curves using gof_m$`Tie prediction`$auc.roc and gof_m$`Tie prediction`$auc.pr. Note that Precision-Recall is most appropriate for sparse networks, while ROC works well for more connected networks. Either way, the plots for the network statistics are generally much more informative compared to ROC and PR (because the ROC and PR) don’t take the dependency structure in your data into account.

In the other plots, the grey boxplots represent the distribution of the values from the observed networks, the thick black line is the median of the simulations and the dashed line is the mean of the simulations. You can manipulate how each of these are shown by using:

snafun::stat_plot_gof(gof_m, median_include = FALSE, mean_col = "red")

Note: you can also feed a fitted ERGM model to

snafun::snafun::stat_plot_gof_as_btergm(m)

and determine the goodness-of-fit for the fitted ERGM that way.

10 Temporal networks (exploration and description)

The main packages to use in this course for descriptive and exploratory analysis of temporal networks are networkDynamic to construct and manipulate temporal networks), tsna (for sna-like network measures), and ndtv (for visualization).

Edges will typically have a starting time (onset), and end time (terminus), a duration, a sender (tail), and a receiver (head). of course, edges can start and end multiple times during the observation period and can have durations of length 0 up until any positive number.

The temporal networks are of class networkDynamic.

10.1 Network generation and manipulation

  • networkDynamic::networkDynamic: construction of a temporal network. There are many ways in which you can construct a temporal network. A common way is to first construct a network that has the vertex names, any vertex static attributes, edge attributes, whether the network is directed, et cetera.
    This network is called base.net and is used by this function to extract the basic aspects of the network. Don’t worry that some values (e.g., vertex attributes) may change over time, because any temporal info you add to this function will override what is in base.net. But base.net is an excellent and efficient way to provide much data to the function about the temporal network and it more cumbersome to add that later on.
    Further, you can provide dynamic data through data.frames for vertices and for edges in several ways. Consult the help function for the details, as this vignette would become far too long otherwise.
  • as.data.frame(g) Extract the dynamic edge info from the network, as a data.frame.

Most of the functions below allow you to specify a time segment you are interested in. Typically, these include onset, terminus, length, and at. Below, we give only one example of how each function can be specified.

  • networkDynamic::list.vertex.attributes.active(g, onset = 5, terminus = 8) List the attributes of the vertices that are active in a specific time segment.

  • networkDynamic::get.vertex.attribute.active(g, "attrName", at = 1) The value for vertex attribute attrName in a specific time segment.

  • networkDynamic::list.edge.attributes.active(g, onset = 0, terminus = 49) List the attributes of the edge that are active in a specific time segment.

  • networkDynamic::get.edge.attribute.active(g, "attrName", at = 1) The value for edge attribute attrName in a specific time segment.

  • networkDynamic::network.extract(classroom, onset = 0, terminus = 1) Extract the part of the temporal network for a specific time segment.

  • networkDynamic::network.collapse(classroom, onset = 0, terminus = 1) Collapse the temporal network into a static network based on the activity within a specific time segment.

  • networkDynamic::activate.vertex.attribute, networkDynamic::activate.edge.attribute, activate.edge.value, activate.network.attribute Set or modify attributes within a specific time segment.

  • deactivate.vertex.attribute, deactivate.edge.attribute, deactivate.network.attribute Make an attribute inactive during a specific time segment.

NOTE: The functions above for accessing and setting the attributes of a networkDynamic object are not very user friendly. Luckily, you can also access and/or set attributes using the network package or the snafun package like in the network manipulation table. As long as you want to access and/or set attributes that are static, this works much easier and uses functions that you have used multiple times already in this course and should be second nature to you by now.

10.2 Network measures and descriptives

  • networkDynamic::duration.matrix(g, changes, start, end) This function takes a given temporal network g, a matrix with columns “time”, “tail”, “head” (this matrix is called a toggle list), and a start and end time. It returns a data.frame a list of edges and activity spells. A toggle represents a switch from active state to inactive, or vice-versa.

  • network.size(g, onset = 5, length = 10). The size of a network during a specific time segment.

The following functions provide useful descriptives of durations in the temporal network.

  • tsna::edgeDuration(g, mode = "duration") or tsna::edgeDuration(g, mode = "counts") Sums the activity duration or number of edge events in a time segment.

  • tsna::vertexDuration(g, mode = "duration") or tsna::vertexDuration(g, mode = "counts") Sums the activity duration or number of vertex events in a time segment.

  • tsna::tiedDuration(g, mode = "duration") Measures the total amount of time each vertex has ties.

  • tsna::tiesDuration(g, mode = "counts") Computes the total number of edge spells each vertex is tied by.

The functions tsna::tEdgeFormation and tsna::tEdgeDissolution compute the number of edges forming or dissolving at time points over a time segment. If result.type = 'fraction' the fraction of the number of edges formed (or dissolved) is computed.

  • tsna::tEdgeFormation(g, start = 1, end = 4, time.interval = 1)

    Counts at times 1, 2, 3, and 4.

  • tsna::tEdgeDissolution(g, start = 1, end = 4, time.interval = 1)

    Counts at times 1, 2, 3, and 4.

10.2.1 Calculating measures from sna over time

You can calculate any measure from the sna package on a collapsed time segment or a series of collapsed time segments through the tsna::tSnaStats function. These measures can be vertex level statistics (e.g., sna:betweenness) or graph-level measures (e.g., sna::grecip). You specify which function you want to calculate and the time segments they should be calculated on. The function returns a time series, which makes the outcomes easy to plot.
For example, you want to calculate transitivity of intervals that are 5 time points wide. The following function calculates transitivity for time intervals [0-5), [5-10), [10-15), etc:

tsna::tSnaStats(g, snafun = "gtrans", time.interval = 5, aggregate.dur = 5)

This can cause some sudden shifts of values, so it is often more informative to use overlapping segments. So, let us calculate density for windows of width 0, at intervals of 3. This calculates density for intervals 0-10, 3-13, 6-16, et cetera:

tsna::tSnaStats(g, snafun = "gden", time.interval = 3, aggregate.dur = 10)

10.2.2 Calculating ergm terms over time

The tsna also allows you to compute ergm terms for specific time segments. Because the model terms provided by the ergm package (and its various add-ons) are ‘change statistics’ (that determine the effect of changing a single tie on the overall network structure), you can use these terms to describe the network within specific time segments. You specify which terms you want to calculate using a formula.

For example,

tsna::tErgmStats(g,'~edges + degree(c(1, 2))', start = 3, end = 10)

calculates the number of edges (edges) and the values for degree(1) and degree(2 for each specified time segment. The output is a time series (with a column for each statistic) and can simply be plotted using plot. This plots the time series for each term above the others, so you can see how all of them develop over time.

data(windsurfers, package = "networkDynamic")

plot(tsna::tErgmStats(windsurfers,'~edges + degree(2) + kstar(3)',
                      aggregate.dur = 5), main = "ERGM terms over time")

In the lecture, we discussed participation shifts–also known as p-shifts. Gibson (2003) defined 13 P-shifts, and the tsna::pShiftCount function can count how often each type occurs in a specific time segment. This is how Gibson describes each of the thirteen types:

knitr::include_graphics("pshifts.png")

10.2.3 Participation shifts

  • tsna::pShiftCount(g, start = 1, end = 3) Calculates the number of times each of the above P-shifts occurred during the specified time segment. In other words, this calculates the P-shift census.

10.2.4 Temporal paths

The tsna::tPath function calculates the set of temporally reachable vertices from a given source vertex starting at a specific time.

  • tsna::tPath(g, v = 12, direction = "fwd", start = 0, end = 3) This calculates the temporal paths from vertex 12 to all other vertices, from the start of the specified time segment. When direction = "bkwd", it determines the paths to vertex 12. You can further specify whether you will find the paths that arrive the first or the ones that leave the vertex at the latest possible times.

The generally most relevant parts of the resulting object are:

  • tdist The time each specific path takes. When a path does not exist, the value if Inf.

  • gsteps The length of the path (in terms of the number of steps). When a path does not exist, the value if Inf.

The tsna::plotPaths plots the network and highlights the calculated temporal paths from the chosen vertex (vertex 12, in the example above). It can also add a label to each edge, so you can see how much time it takes for that edge to be activated from this focal vertex. You can tweak the plot like you would tweak any network plot of class network.

tsna::plotPaths(
  g,
  paths = tsna::tPath(g, v = 12, direction = "fwd", start = 0, end = 3),
  displaylabels = FALSE,        # remove the vertex labels, to prevent too much visual clutter
  vertex.col = "white",
  edge.label.cex = 1.5          # the color of the printed times
)

A related concept is that of “temporal reachability.” The tsna::tReach function computes, for each vertex, the number of vertices that are temporally reachable over the entire observation period.<> If you want to compute this for a specific time segment, first use networkDynamic::network.extract to extract the segment of interest and then feed this to the tsna::tReach function.

  • tsna::tReach(g, direction = "fwd", start = 10, end = 20) The function to calculate the temporal reachable sets using only temporally forward steps (you can also specify direction = "bkwd" to determine by how many vertices each vertex can be temporally reached).

10.3 Network visualization

Temporal networks can be visualized in two ways. First, static plots can be made of a temporal network, either by collapsing the temporal network into a static network (or to break up the temporal network into static networks of specific time segments).

10.3.1 Visualizing as static networks

An obvious way to visualize the entire temporal network as a static network is to simply use plot(g).

Alternatively, the temporal network can be collapsed into smaller time segments and plot these the network slices as static representations.

There are two functions that can do this. The ndtv package has the

ndtv::filmstrip that does this as follows:

  • ndtv::filmstrip(g, frames = 9) This plots the network at 9 points in time. It does not provide an overview of how the network changes over time, but it provides a series of snapshots (9, in this example) of the network. If the timing of the edges is in continuous time, this function has the tendency to plot nearly empty graphs, as it evaluates the networks at specific time points, rather than time intervals.

The snafun package implements a function that divides the specified time period into time segments of equal time length and plots each segment as a static network. This is useful to see how the network changes over time. It also works nicely for networks where changes happen in continuous time.

snafun::plot_network_slices(9, number = 93)

A sometimes useful function is ndtv::proximity.timeline, which shows the distance between the edges over time. The main purpose is to see how the edges move vis-a-vis each other over time (based on the geodesic path distance) and it often helps to see where and when subgroups are forming over time.

The function call is:

ndtv::proximity.timeline(g,  start = 10, end = 50,
                         time.increment = .5,
                         mode = 'isoMDS')

where you can change the mode to a different scaling algorithm. For actual research projects, you want to try various settings and check which gives you the most informative output for the data at hand.

The function allows you to set many arguments (such as labels and colors).

10.3.2 Visualizing as a dynamic network animation

The ndtv package includes functions to create an animation of how the network unfolds over time. There are many arguments you can tweak, so here we only focus on the main approach. Make sure to consult the package help for more details.

There are two steps in creating a dynamic visualization in ndtv: you first run ndtv::compute.animation, which determines coordinates and other aspects of the dynamic plot. Second, you run ndtv::render.d3movie, which, you guessed it, renders the actual movie.

# step 0: unfortunately, we have to load the package into our session
library(ndtv)

# step 1: compute the settings
ndtv::compute.animation(g, animation.mode = "kamadakawai",
                        slice.par = list(start = 0, end = 45,
                                         aggregate.dur = 1,
                                         interval = 1, rule = "any"))

# step 2, render the animation
ndtv::render.d3movie(g, usearrows = TRUE, displaylabels = FALSE ,
                     bg = "#111111",
                     edge.col = "#55555599",
                     render.par = list(tween.frames = 15,
                                       show.time = TRUE),
                     d3.options = list(animationDuration = 1000,
                                       playControls = TRUE,
                                       durationControl = TRUE),
                     output.mode = 'htmlWidget'
                     )

Some important arguments for ndtv::render.d3movie include:

  • launchBrowser: defaults to TRUE: determines whether the animation will be shown in the Browser after rendering.

  • output.mode: the kind of output you want (defaults to ‘HTML’)

  • filename: The file name of the HTML or JSON file to be generated. Only relevant if you picked ‘HTML’ or ‘JSON’ as output.mode.

Further, you can set most of the common graphical parameters, such as vertex.col, label.cex, use.arrows, edge.lwd, et cetera.

If you want to fix vertices to the same location throughout the animation, you do this as follows

# use some way to determine a matrix of vertex locations
coords <- ndtv::network.layout.animate.kamadakawai(g)

# add the x and y coordinates as vertex attributes
# adapt onset and terminus if required
networkDynamic::activate.vertex.attribute(g, "x", coords[, 1],
                                          onset = -Inf, terminus = Inf)

networkDynamic::activate.vertex.attribute(g, "y", coords[, 2],
                                          onset = -Inf, terminus = Inf)

# compute the new animation settings
# We now use `animation.mode = "useAttribute"`
ndtv::compute.animation(g, animation.mode = "useAttribute",
                        slice.par = list(start = 0, end = 45,
                                         aggregate.dur = 1,
                                         interval = 1, rule = "any"))